S.-T. Yau College Student Mathematics Contests 2022

Probability and Statistics

Solve every problem.

Problem 1. Let $\{X_n\}$ be a sequence of Gaussian random variables. Suppose that $X$ is a random variable such that $X_n$ converges to $X$ in distribution as $n \rightarrow \infty$. Show that $X$ is also a (possibly degenerate, i.e., variance zero) Gaussian random variable.
Solution: Let $f_n(t)=\mathbb{E} e^{i t X_n}$ be the characteristic function of $X_n$ and $f(t)=\mathbb{E} e^{i t X}$ be that of $X$. There are real numbers $\mu_n$ and $\sigma_n$ such that $f_n(t)=e^{i \mu_n t-\sigma_n^2 t^2 / 2}$. We have $\left|f_n(t)\right|^2 \rightarrow|f(t)|^2$, hence $e^{-\sigma_n^2 t^2} \rightarrow|f(t)|^2$ for all $t \in \mathbf{R}$. Since $f(t) \neq 0$ if $t$ is close to $0$, we must have $\sigma_n^2 \rightarrow \sigma^2$ for some $\sigma \in[0, \infty)$. Now we have $e^{i \mu_n t} \rightarrow f(t) e^{\sigma^2 t^2 / 2}$ for all $t \in \mathbf{R}$, and since $\left|e^{i \mu_n s}\right| \leq 1$, the dominated convergence theorem gives

$$\lim_{n \rightarrow \infty} \int_0^t e^{i \mu_n s}\, d s=\int_0^t f(s) e^{\sigma^2 s^2 / 2}\, d s$$
The integral on the right side does not vanish if $t$ is close, but not equal, to $0$, because the integrand is continuous and equal to $1$ at $s=0$. On the other hand,

$$i \mu_n \int_0^t e^{i \mu_n s}\, d s=e^{i \mu_n t}-1$$
This gives
$$\mu_n=-i\left(f_n(t) e^{\sigma_n^2 t^2 / 2}-1\right)\left(\int_0^t e^{i \mu_n s}\, d s\right)^{-1}$$
from which we see that $\mu_n$ must converge to a finite number $\mu$: the factor $f_n(t) e^{\sigma_n^2 t^2 / 2}-1$ converges, and by the previous two displays the integral converges to a nonzero limit for $t$ close to, but not equal to, $0$. Finally,
$$f_n(t) \rightarrow e^{i \mu t-\sigma^2 t^2 / 2}=f(t)$$

and $X$ must be a (possibly degenerate) Gaussian random variable.
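As a quick numerical sanity check (not part of the proof), the two relations used above, $|f(t)|^2=e^{-\sigma^2 t^2}$ and $e^{i \mu t}=f(t) e^{\sigma^2 t^2 / 2}$, recover $(\mu, \sigma^2)$ from a Gaussian characteristic function. The sketch below uses arbitrary illustrative values.

```python
import numpy as np

# Hypothetical limiting characteristic function f(t) = exp(i*mu*t - sigma^2*t^2/2)
mu_true, sigma2_true = 1.5, 4.0

def f(t):
    return np.exp(1j * mu_true * t - sigma2_true * t**2 / 2)

t = 0.3                                              # a point near 0, where f(t) != 0
sigma2 = -2.0 * np.log(np.abs(f(t))) / t**2          # from |f(t)|^2 = e^{-sigma^2 t^2}
mu = np.angle(f(t) * np.exp(sigma2 * t**2 / 2)) / t  # from e^{i mu t} = f(t) e^{sigma^2 t^2/2}

print(sigma2, mu)                                    # recovers 4.0 and 1.5
```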
Problem 2. For two probability measures $\mu$ and $\nu$ on the real line $\mathbf{R}$, the total variation distance $\|\mu-\nu\|_{T V}$ is defined as

$$\|\mu-\nu\|_{T V}=\sup \{\mu(C)-\nu(C): C \in \mathcal{B}(\mathbf{R})\}$$
where $\mathcal{B}(\mathbf{R})$ is the $\sigma$-algebra of Borel sets on $\mathbf{R}$. Let $\mathcal{C}(\mu, \nu)$ be the space of couplings of the probability measures $\mu$ and $\nu$, i.e., the space of $\mathbf{R}^2$-valued random variables $(X, Y)$ defined on some (not necessarily the same) probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that the marginal distributions of $X$ and $Y$ are $\mu$ and $\nu$, respectively. Show that
$$\|\mu-\nu\|_{T V}=\inf \{\mathbb{P}(X \neq Y):(X, Y) \in \mathcal{C}(\mu, \nu)\}$$

For simplicity you may assume that $\mu$ and $\nu$ are absolutely continuous with respect to the Lebesgue measure on $\mathbf{R}$.
Solution: (1) Let $C \in \mathcal{B}(\mathbf{R})$ and $(X, Y) \in \mathcal{C}(\mu, \nu)$. Then

$$\mu(C)-\nu(C)=\mathbb{P}\{X \in C\}-\mathbb{P}\{Y \in C\} \leq \mathbb{P}\{X \in C, Y \notin C\} \leq \mathbb{P}\{X \neq Y\}$$

Taking the supremum over $C \in \mathcal{B}(\mathbf{R})$ and then the infimum over $(X, Y) \in \mathcal{C}(\mu, \nu)$ we obtain

$$\|\mu-\nu\|_{T V} \leq \inf \{\mathbb{P}\{X \neq Y\}:(X, Y) \in \mathcal{C}(\mu, \nu)\}$$
(2) It is sufficient to construct a coupling $(X, Y) \in \mathcal{C}(\mu, \nu)$, with joint law $\mathbb{P}$ on $\mathbf{R}^2$, and a set $C \in \mathcal{B}(\mathbf{R})$ such that

$$\mu(C)-\nu(C)=\mathbb{P}\{X \neq Y\}$$
The idea is to construct $\mathbb{P}$ such that the probability $\mathbb{P}\{X=Y\}$ is the largest possible under the condition that $(X, Y) \in \mathcal{C}(\mu, \nu)$. Let $m=\mu+\nu$, or simply take $m$ to be the Lebesgue measure if $\mu$ and $\nu$ are absolutely continuous with respect to it. In either case $\mu$ and $\nu$ are absolutely continuous with respect to $m$, so $\mu=f_1 \cdot m$ and $\nu=f_2 \cdot m$ by the Radon-Nikodym theorem. Let $f=\min \left\{f_1, f_2\right\}=f_1 \wedge f_2$. Define a probability measure $\mathbb{P}$ on $\mathbf{R}^2$ by
$$\mathbb{P}\{(X, Y) \in A \times B\}=\frac{1}{1-a} \int_{A \times B}\left(f_1(x)-f(x)\right)\left(f_2(y)-f(y)\right) m(d x)\, m(d y)+\int_{A \cap B} f(z)\, m(d z)$$
Here $a=\int_{\mathbf{R}} f(z)\, m(d z)$ and we assume that $a<1$; otherwise $a=1$, $f_1=f_2$, and the claim is trivial. Note that the first part is (up to a constant) the product measure of $\left(f_1-f\right) \cdot m$ and $\left(f_2-f\right) \cdot m$, and the second part is the measure $f \cdot m$ carried by the diagonal (identified with $\mathbf{R}$) of $\mathbf{R}^2$. We have
$$\mathbb{P}\{X \in A\}=\int_A\left(f_1(x)-f(x)\right) m(d x)+\int_A f(z)\, m(d z)=\int_A f_1(x)\, m(d x)=\mu(A) .$$

Similarly $\mathbb{P}\{Y \in B\}=\nu(B)$, hence $(X, Y) \in \mathcal{C}(\mu, \nu)$. On the other hand,

$$\mathbb{P}\{X \neq Y\}=\int_{\mathbf{R}}\left(f_1(x)-f(x)\right) m(d x)=1-a$$

If we choose $C=\left\{f_1>f_2\right\}$, then

$$\mu(C)-\nu(C)=\int_C\left(f_1(x)-f_2(x)\right) m(d x)=\int_{\mathbf{R}}\left(f_1(x)-f(x)\right) m(d x)=1-a$$
This shows that $\mu(C)-\nu(C)=\mathbb{P}\{X \neq Y\}$; combined with the inequality from (1), this gives the desired equality.
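The construction above can be checked numerically. Below is a minimal sketch with two arbitrary discrete distributions standing in for $\mu$ and $\nu$ (and $m$ the counting measure): with probability $a$ the two coordinates are drawn equal from $f / a$, otherwise independently from the normalized residuals. The simulated $\mathbb{P}\{X \neq Y\}$ should match $\|\mu-\nu\|_{T V}=1-a$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical discrete distributions on {0, ..., 4} (densities w.r.t. counting measure)
mu = np.array([0.10, 0.30, 0.20, 0.25, 0.15])
nu = np.array([0.25, 0.05, 0.30, 0.20, 0.20])

f = np.minimum(mu, nu)                 # f = f1 ∧ f2
a = f.sum()                            # overlap mass
tv = (mu - nu)[mu > nu].sum()          # total variation distance, equal to 1 - a

# Maximal coupling: with prob. a set X = Y ~ f/a, otherwise draw X, Y independently
# from the normalized residuals (f1 - f)/(1 - a) and (f2 - f)/(1 - a).
n = 200_000
same = rng.random(n) < a
z = rng.choice(5, size=n, p=f / a)
x_res = rng.choice(5, size=n, p=(mu - f) / (1 - a))
y_res = rng.choice(5, size=n, p=(nu - f) / (1 - a))
X = np.where(same, z, x_res)
Y = np.where(same, z, y_res)

print(tv, 1 - a, np.mean(X != Y))      # the three numbers should (approximately) agree
```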
Problem 3. We throw a fair die repeatedly and independently. Let $\tau_{11}$ be the first time the pattern 11 (two consecutive 1's) appears and $\tau_{12}$ the first time the pattern 12 (a 1 followed by a 2) appears.

(a) Calculate the expected value $\mathbb{E} \tau_{11}$.

(b) Which is larger, $\mathbb{E} \tau_{11}$ or $\mathbb{E} \tau_{12}$? It is sufficient to give an intuitive argument to justify your answer. You can also calculate $\mathbb{E} \tau_{12}$ if you wish.

Solution:

(a) Let $\tau_1$ be the first time the digit 1 appears. At this time, if the next result is 1, then $\tau_{11}=\tau_1+1$; if the next result is not 1, then $\tau_1+1$ rolls have been used and we have to start all over again. This means
$$\mathbb{E} \tau_{11}=\frac{1}{6}\left(\mathbb{E} \tau_1+1\right)+\frac{5}{6}\left(\mathbb{E} \tau_1+1+\mathbb{E} \tau_{11}\right)$$
Solving for $\mathbb{E} \tau_{11}$ we obtain $\mathbb{E} \tau_{11}=6\left(\mathbb{E} \tau_1+1\right)$. We need to calculate $\mathbb{E} \tau_1$. The set $\left\{\tau_1 \geq n\right\}$ is the event that none of the first $n-1$ results is 1, hence $\mathbb{P}\left\{\tau_1 \geq n\right\}=(5 / 6)^{n-1}$ and
$$\mathbb{E} \tau_1=\sum_{n=1}^{\infty} \mathbb{P}\left\{\tau_1 \geq n\right\}=\sum_{n=1}^{\infty}\left(\frac{5}{6}\right)^{n-1}=6$$

It follows that $\mathbb{E} \tau_{11}=6(6+1)=42$.
(b) For either 11 or 12 to occur, we have to wait until the first 1 occurs. After that, if we want 11, the next digit needs to be 1; otherwise we have to start all over again (i.e., wait for the next 1). But if we want 12, the next digit needs to be 2; otherwise we have to start all over again only if the next digit is 3 to 6, because if the next digit is 1, we already have a start on the pattern 12. It follows that the pattern 12 has a slight advantage and tends to occur earlier than 11. Thus we have $\mathbb{E} \tau_{12} \leq \mathbb{E} \tau_{11}$.
We can also calculate $\mathbb{E} \tau_{12}$ directly. Let $\tau_1$ be as before and let $\sigma$ be the number of rolls after time $\tau_1$ until a digit not equal to 1 appears. Given that this digit is not 1, it is uniform on $\{2, \ldots, 6\}$: with probability $1 / 5$ it is 2; with probability $4 / 5$ it is not 2, and then we have to start over again. This means that
$$\mathbb{E} \tau_{12}=\frac{1}{5}\, \mathbb{E}\left(\tau_1+\sigma\right)+\frac{4}{5}\left(\mathbb{E}\left(\tau_1+\sigma\right)+\mathbb{E} \tau_{12}\right)$$
Hence $\mathbb{E} \tau_{12}=5\, \mathbb{E}\left(\tau_1+\sigma\right)$. We have seen that $\mathbb{E} \tau_1=6$. On the other hand, $\{\sigma \geq n\}$ is the event that the first $n-1$ digits after time $\tau_1$ are all 1, hence $\mathbb{P}\{\sigma \geq n\}=(1 / 6)^{n-1}$ and $\mathbb{E} \sigma=6 / 5$. It follows that
$$\mathbb{E} \tau_{12}=5\left(6+\frac{6}{5}\right)=36$$
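Both values are easy to confirm by simulation. Here is a minimal Monte Carlo sketch (the function and parameter names are ours, not part of the solution); the two estimates should be close to 42 and 36.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_waiting_time(pattern, n_trials=50_000):
    """Monte Carlo estimate of E[tau] for a two-symbol pattern on a fair die."""
    total = 0
    for _ in range(n_trials):
        prev, t = 0, 0                      # prev = previous roll (0 = no roll yet)
        while True:
            t += 1
            roll = rng.integers(1, 7)       # uniform on {1, ..., 6}
            if prev == pattern[0] and roll == pattern[1]:
                break
            prev = roll
        total += t
    return total / n_trials

print(mean_waiting_time((1, 1)))            # close to 42
print(mean_waiting_time((1, 2)))            # close to 36
```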
Problem 4. Let $\{X_n\}$ be a Markov chain on a discrete state space $S$ with transition function $p(x, y)$, $x, y \in S$. Suppose that there is a state $y_0 \in S$ and a positive number $\theta$ such that $p\left(x, y_0\right) \geq \theta$ for all $x \in S$.
(a) Show that there is a positive constant $\lambda<1$ such that for any two initial distributions $\mu$ and $\nu$,
$$\sum_{y \in S}\left|\mathbb{P}_\mu\left\{X_1=y\right\}-\mathbb{P}_\nu\left\{X_1=y\right\}\right| \leq \lambda \sum_{y \in S}|\mu(y)-\nu(y)|$$

(b) Show that the Markov chain has a unique stationary distribution $\pi$ and

$$\sum_{y \in S}\left|\mathbb{P}_\mu\left\{X_n=y\right\}-\pi(y)\right| \leq 2 \lambda^n$$

Solution:

(a) Replacing $\theta$ by $\inf \left\{p\left(x, y_0\right): x \in S\right\}$ if necessary, we may assume $0<\theta \leq 1$. For any two probability measures $\mu$ and $\nu$ on the state space $S$, we have
$$\sum_{y \in S}\left|\mathbb{P}_\mu\left\{X_1=y\right\}-\mathbb{P}_\nu\left\{X_1=y\right\}\right|=\sum_{y \in S}\left|\sum_{x \in S}\{\mu(x)-\nu(x)\} p(x, y)\right|$$

For the term $y=y_0$ we can replace $p\left(x, y_0\right)$ by $p\left(x, y_0\right)-\theta$ because $\sum_{x \in S}\{\mu(x)-\nu(x)\}=1-1=0$. After this replacement, we take the absolute value of every term and exchange the order of summation. Using the fact that $p\left(x, y_0\right)-\theta \geq 0$ we have

$$\sum_{y \in S}\left|\mathbb{P}_\mu\left\{X_1=y\right\}-\mathbb{P}_\nu\left\{X_1=y\right\}\right| \leq \sum_{x \in S}\left[\sum_{y \in S} p(x, y)-\theta\right]|\mu(x)-\nu(x)|$$
The sum in the brackets equals $1-\theta=\lambda<1$ for every $x$. It follows that
$$\sum_{y \in S}\left|\mathbb{P}_\mu\left\{X_1=y\right\}-\mathbb{P}_\nu\left\{X_1=y\right\}\right| \leq \lambda \sum_{x \in S}|\mu(x)-\nu(x)|$$
(b) Let $\mu_n(x)=\mathbb{P}_\mu\left\{X_n=x\right\}$. Then $\mu_{n+1}(x)=\mathbb{P}_{\mu_n}\left\{X_1=x\right\}$ and $\mu_n(x)=\mathbb{P}_{\mu_{n-1}}\left\{X_1=x\right\}$. By (a),

$$\sum_{x \in S}\left|\mu_{n+1}(x)-\mu_n(x)\right| \leq \lambda \sum_{x \in S}\left|\mu_n(x)-\mu_{n-1}(x)\right|$$
It follows that
$$\sum_{x \in S}\left|\mu_{n+1}(x)-\mu_n(x)\right| \leq \lambda^n \sum_{x \in S}\left|\mu_1(x)-\mu(x)\right| \leq 2 \lambda^n$$
Since $0 \leq \lambda<1$, this bound shows that $\left\{\mu_n\right\}$ is a Cauchy sequence in the $\ell^1$ norm, so $\mu_n$ converges to a distribution $\pi$, which is obviously stationary. By the same argument,
$$\sum_{y \in S}\left|\mathbb{P}_\mu\left\{X_n=y\right\}-\pi(y)\right|=\sum_{y \in S}\left|\mathbb{P}_\mu\left\{X_n=y\right\}-\mathbb{P}_\pi\left\{X_n=y\right\}\right| \leq 2 \lambda^n$$

If $\sigma$ is another stationary distribution, then

$$\sum_{y \in S}|\sigma(y)-\pi(y)|=\sum_{y \in S}\left|\mathbb{P}_\sigma\left\{X_n=y\right\}-\mathbb{P}_\pi\left\{X_n=y\right\}\right| \leq 2 \lambda^n \longrightarrow 0$$
Hence the stationary distribution of the Markov chain is unique.
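The bound in (b) is easy to visualize numerically. Below is a minimal sketch on a hypothetical 5-state chain built so that $p\left(x, y_0\right)=\theta$ for every $x$; the $\ell^1$ distance to the stationary distribution should stay below $2 \lambda^n$ with $\lambda=1-\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 5-state chain with p(x, y0) >= theta for y0 = 0
S, theta, y0 = 5, 0.2, 0
P = rng.random((S, S))
P[:, y0] = 0.0
P = (1 - theta) * P / P.sum(axis=1, keepdims=True)   # each row now sums to 1 - theta
P[:, y0] += theta                                    # so p(x, y0) = theta for every x

lam = 1 - theta
mu0 = np.zeros(S)
mu0[1] = 1.0                                         # initial distribution delta_1
pi = mu0 @ np.linalg.matrix_power(P, 500)            # numerically stationary distribution

for n in range(1, 11):
    mun = mu0 @ np.linalg.matrix_power(P, n)
    print(n, np.abs(mun - pi).sum(), 2 * lam**n)     # l1 distance vs. the bound 2*lambda^n
```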
Problem 5. Consider a linear regression model with $p$ predictors and $n$ observations:

$$\mathbf{Y}=X \beta+\mathbf{e}$$

where $X_{n \times p}$ is the design matrix, $\beta$ is the unknown coefficient vector, and the random error vector $\mathbf{e}$ has a multivariate normal distribution with mean zero and $\operatorname{Var}(\mathbf{e})=\sigma^2 I_n$ ($\sigma^2>0$ unknown and $I_n$ the identity matrix). Here $\operatorname{rank}(X)=k \leq p$; $p$ may or may not be greater than $n$, but we assume $n-k>1$. Let $\mathbf{x}_1=\left(x_{1,1}, \ldots, x_{1, p}\right)$ be the first row of $X$ and define

$$\gamma=\frac{\mathbf{x}_1 \beta}{\sigma}$$

Find the uniformly minimum variance unbiased estimator (UMVUE) of $\gamma$ or prove that it does not exist.
Solution: The key points in the solution are the following.
(i) Any least squares estimator, say $\hat{\beta}$, of $\beta$ is independent of $\hat{\sigma}^2=\|\mathbf{Y}-X \hat{\beta}\|^2 /(n-k)$.

(ii) $\mathbf{x}_1 \beta$ is clearly estimable.
(iii) Based on (i) and (ii), we can construct an unbiased estimator, say $\hat{\gamma}$, of $\gamma$ in terms of $\hat{\beta}$ and $\hat{\sigma}^2$, and consequently the estimator is a function of $X^T \mathbf{Y}$ and $\|\mathbf{Y}-X \hat{\beta}\|^2$.

(iv) In fact, $\left(X^T \mathbf{Y},\|\mathbf{Y}-X \hat{\beta}\|^2\right)$ is a complete and sufficient statistic, and we conclude that $\hat{\gamma}$ is the UMVUE of $\gamma$. More details are given below.
Let $\hat{\beta}=\left(X^T X\right)^{-} X^T \mathbf{Y}$ be a least squares estimator of $\beta$, where $\left(X^T X\right)^{-}$ denotes any generalized inverse of $X^T X$. Let $\theta=\mathbf{x}_1 \beta$, which is clearly estimable. By the Gauss-Markov theorem, $\hat{\theta}:=\mathbf{x}_1 \hat{\beta}$ is the best linear unbiased estimator of $\theta$. For the unbiased estimator $\hat{\sigma}^2=\|\mathbf{Y}-\hat{\mathbf{Y}}\|^2 /(n-k)$, we know that $(n-k) \hat{\sigma}^2 / \sigma^2$ has the $\chi_{n-k}^2$ distribution, which belongs to the Gamma family. Thus, it is readily seen that $\mathbb{E}(1 / \hat{\sigma})=C / \sigma$, where $C$ is the known constant $C=\sqrt{n-k}\, \Gamma\!\left(\frac{n-k-1}{2}\right) /\left(\sqrt{2}\, \Gamma\!\left(\frac{n-k}{2}\right)\right)$.
Let $\hat{\gamma}=\hat{\theta} /(C \hat{\sigma})$. Let $H=X\left(X^T X\right)^{-} X^T$ denote the projection matrix. Clearly $\left(I_n-H\right) X=0$, which implies $\operatorname{Cov}\left(\left(X^T X\right)^{-} X^T \mathbf{Y},\left(I_n-H\right) \mathbf{Y}\right)=0$. Together with the Gaussian error assumption, we know that $\left(X^T X\right)^{-} X^T \mathbf{Y}$ and $\left(I_n-H\right) \mathbf{Y}$ are independent. It follows that $\hat{\beta}$ (for any choice of generalized inverse) and $\hat{\sigma}^2$ are independent. This leads to the unbiasedness of $\hat{\gamma}$.
With elementary simplifications, based on basic exponential family properties, we see that $T=\left(X^T \mathbf{Y},\|\mathbf{Y}-\hat{\mathbf{Y}}\|^2\right)$ is a complete and sufficient statistic. We conclude that $\hat{\gamma}$ is unbiased and a function of a complete and sufficient statistic, and hence it must be the UMVUE of $\gamma$.
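As a quick check of the unbiasedness of $\hat{\gamma}=\hat{\theta} /(C \hat{\sigma})$, here is a minimal simulation sketch assuming a hypothetical full-rank design ($k=p$), with arbitrary illustrative values for $\beta$ and $\sigma$; the Monte Carlo average of $\hat{\gamma}$ should be close to $\gamma$.

```python
import numpy as np
from math import lgamma, exp, log

rng = np.random.default_rng(0)

# Hypothetical full-rank setup: n = 30 observations, k = p = 4 predictors
n, p = 30, 4
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5, 3.0])
sigma = 1.5
gamma_true = X[0] @ beta / sigma

nk = n - p                                   # n - k degrees of freedom
# C = sqrt(n-k) * Gamma((n-k-1)/2) / (sqrt(2) * Gamma((n-k)/2)), so E(1/sigma_hat) = C/sigma
C = exp(0.5 * log(nk) + lgamma((nk - 1) / 2) - 0.5 * log(2) - lgamma(nk / 2))

est = []
for _ in range(20_000):
    Y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sigma_hat = np.sqrt(np.sum((Y - X @ beta_hat) ** 2) / nk)
    est.append(X[0] @ beta_hat / (C * sigma_hat))

print(gamma_true, np.mean(est))              # the two values should be close
```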
Problem 6. Let $X_1, \ldots, X_{2022}$ be independent random variables with $X_i \sim N\left(\theta_i, i^2\right)$, $1 \leq i \leq 2022$. For estimating the unknown mean vector $\theta \in \mathbf{R}^{2022}$, consider the loss function $L(\theta, \mathbf{d})=\sum_{i=1}^{2022}\left(d_i-\theta_i\right)^2 / i^2$. Prove that $\mathbf{X}=\left(X_1, \ldots, X_{2022}\right)$ is a minimax estimator of $\theta$.

Recall: If $Y \mid \mu \sim N\left(\mu, \sigma^2\right)$ and $\mu \sim N\left(\mu_0, \sigma_0^2\right)$, then $\mu \mid Y=y \sim N\left(\frac{\mu_0 / \sigma_0^2+y / \sigma^2}{1 / \sigma_0^2+1 / \sigma^2}, \frac{1}{1 / \sigma_0^2+1 / \sigma^2}\right)$.
Solution: We show that $\mathbf{X}$, as an equalizer (constant-risk) estimator, achieves the limit of the Bayes risks under a suitable sequence of priors. First, consider independent priors $\theta_i \sim N\left(0, \tau^2\right)$, $1 \leq i \leq 2022$. Then the Bayes estimator $\delta_\tau$ has $i$-th component (the estimator of $\theta_i$) equal to the posterior mean $E_\tau\left(\theta_i \mid \mathbf{X}\right)=\frac{X_i / i^2}{1 / \tau^2+1 / i^2}$. The associated Bayes risk is $R_\tau\left(\delta_\tau\right)=\sum_{i=1}^{2022} i^{-2} \frac{1}{1 / \tau^2+1 / i^2}$. Clearly, as $\tau \rightarrow \infty$, $R_\tau\left(\delta_\tau\right) \rightarrow \sum_{i=1}^{2022} 1=2022$, which equals the constant risk of $\mathbf{X}$. This implies that $N\left(0, \tau^2\right)$ with $\tau \rightarrow \infty$ gives a least favorable sequence of priors and $\mathbf{X}$ is minimax.
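To illustrate the limiting argument numerically (a sketch only, not part of the proof), one can evaluate $R_\tau\left(\delta_\tau\right)$ for increasing $\tau$ and compare it with the constant risk of $\mathbf{X}$:

```python
import numpy as np

i = np.arange(1, 2023)                        # coordinates 1, ..., 2022
risk_X = np.sum(i**2 / i**2)                  # constant risk of X under L: equals 2022

for tau in [10.0, 100.0, 1_000.0, 10_000.0]:
    bayes_risk = np.sum((1.0 / i**2) / (1.0 / tau**2 + 1.0 / i**2))
    print(tau, bayes_risk)                    # increases toward 2022 as tau grows

print(risk_X)
```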